Extra_options `disable_qkv_fusion` to untie qkv_projs from upstream choice by jixiongdeng · Pull Request #1893 · microsoft/onnxruntime-genai

jixiongdeng · 2025-11-25T22:46:46Z

Problem

As discussed in this PR (#1885), I have split the disable_qkv_fusion option out into its own PR.

By default, the current model builder fuses q_proj, k_proj and v_proj into a single qkv_proj, and this fusion cannot be controlled by the upstream quantization choice.
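For readers unfamiliar with the fusion being described, here is a minimal pure-Python sketch of the idea, not the builder's actual code. The helper names (`matmul`, `pack_qkv`, `split_qkv`) are hypothetical and exist only for illustration. It shows that a packed qkv_proj gives the same outputs as three separate projections, so the two layouts differ only in whether the weights live in one tensor or three.

```python
def matmul(x, w):
    """Multiply x (n x d) by w (d x m); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def pack_qkv(w_q, w_k, w_v):
    """Concatenate the three weight matrices along the output dimension."""
    return [rq + rk + rv for rq, rk, rv in zip(w_q, w_k, w_v)]


def split_qkv(packed_out, q_dim, k_dim):
    """Slice the packed output back into separate Q, K and V outputs."""
    q = [row[:q_dim] for row in packed_out]
    k = [row[q_dim:q_dim + k_dim] for row in packed_out]
    v = [row[q_dim + k_dim:] for row in packed_out]
    return q, k, v


# Toy input and weights (illustrative values only).
x = [[1, 2], [3, 4]]
w_q = [[1, 0], [0, 1]]
w_k = [[2], [1]]
w_v = [[0], [3]]

# Fused path: one matmul against the packed weight, then split.
fused = split_qkv(matmul(x, pack_qkv(w_q, w_k, w_v)), q_dim=2, k_dim=1)
# Untied path: three independent matmuls.
untied = (matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
assert fused == untied
```

Because the results match, untying the projections changes how the weights are stored, and therefore how each one can be handled separately, without changing what attention computes.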

Solution

Added disable_qkv_fusion in extra_options to override attention_attrs["use_packed_matmul"].
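As a rough illustration of the override described above, the sketch below shows one way a boolean option could force `use_packed_matmul` off. It is an assumption-laden sketch: `resolve_use_packed_matmul` and `default_packed` are hypothetical names, and the real condition in `builders/base.py` may depend on other factors not shown on this page.

```python
def resolve_use_packed_matmul(default_packed, extra_options):
    """Hypothetical helper: decide whether Q/K/V projections stay fused.

    default_packed stands in for whatever the builder would choose on its
    own; disable_qkv_fusion=True overrides that choice and forces untied
    projections.
    """
    disable_fusion = bool(extra_options.get("disable_qkv_fusion", False))
    return default_packed and not disable_fusion


# The option only ever turns fusion off; it never turns it on.
attention_attrs = {
    "use_packed_matmul": resolve_use_packed_matmul(True, {"disable_qkv_fusion": True})
}
```

With this shape, omitting the option leaves the builder's default untouched, which keeps existing invocations behaving as before.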

Running examples:

Untied qkv_projs for 4-bit RTN on Llama-3.2-3B-Instruct:

python src/python/py/models/builder.py -m meta-llama/Llama-3.2-3B-Instruct -p int4 -e cuda -o export_model/llama32_3bi_rtn_u4_untied_qkv --extra_options int4_algo_config=rtn disable_qkv_fusion=true
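The command above passes `key=value` pairs after `--extra_options`. The sketch below shows one way such pairs could be turned into a dictionary with boolean coercion. It is illustrative only: `parse_extra_options` and `bool_keys` are hypothetical names, and the actual parsing in `builder.py` may differ.

```python
def parse_extra_options(pairs, bool_keys=frozenset({"disable_qkv_fusion"})):
    """Hypothetical parser for key=value pairs given after --extra_options."""
    options = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        key, value = key.strip(), value.strip()
        if key in bool_keys:
            # Coerce the textual flag to a real bool so later checks are unambiguous.
            lowered = value.lower()
            if lowered in ("true", "1"):
                options[key] = True
            elif lowered in ("false", "0"):
                options[key] = False
            else:
                raise ValueError(f"{key} must be true/false or 1/0, got {value!r}")
        else:
            options[key] = value
    return options


opts = parse_extra_options(["int4_algo_config=rtn", "disable_qkv_fusion=true"])
```

Coercing to `bool` at parse time avoids the common pitfall where the non-empty string `"false"` is treated as truthy.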

Changes

Modified Files

src/python/py/models/builder.py
src/python/py/models/builders/base.py
src/python/py/models/README.MD

Key Modifications

Added disable_qkv_fusion to the logic that assigns attention_attrs["use_packed_matmul"].
Added documentation.

jixiongdeng · 2025-11-26T20:14:33Z

Rebased onto main and resolved the conflict. Thank you! @kunal-vaishnavi


jixiongdeng requested review from chenfucn, kunal-vaishnavi and tianleiwu November 25, 2025 22:46

kunal-vaishnavi previously approved these changes Nov 26, 2025


jixiongdeng added 2 commits November 26, 2025 18:14

Added disable_qkv_fusion in extra_options&Updated README.MD

d94128f

Updated builder bool list&Lint

9658ae7

jixiongdeng dismissed kunal-vaishnavi’s stale review via 9658ae7 November 26, 2025 18:26

jixiongdeng force-pushed the jd/disable_fuse_qkv branch from e4ebe46 to 9658ae7 Compare November 26, 2025 18:26

tianleiwu approved these changes Nov 26, 2025


tianleiwu merged commit 4f37298 into main Nov 26, 2025
15 checks passed

tianleiwu deleted the jd/disable_fuse_qkv branch November 26, 2025 21:55

dependabot bot mentioned this pull request Dec 15, 2025

Bump Microsoft.ML.OnnxRuntimeGenAI from 0.11.2 to 0.11.4 yuniko-software/qwen3-onnx#10

Merged
